Language modeling of Chinese personal names based on character units for continuous Chinese speech recognition

Authors

  • Xinhui Hu
  • Hirofumi Yamamoto
  • Gen-ichiro Kikui
  • Yoshinori Sagisaka
Abstract

In this paper, we analyze Chinese personal names to model their statistical phonotactic characteristics for continuous Chinese speech recognition. The analysis showed language-specific characteristics of Chinese personal names and strongly suggested the advantage of character-unit-oriented modeling. We composed a hierarchical language model with two layers: a lower intra-word model that reflects the statistical phonotactic characteristics of Chinese personal names, and an upper multi-class composite N-gram model that captures ordinary inter-word neighboring characteristics. The two layers were trained independently on different language corpora. For given names, the syllable without tone information was selected as the unit for training the bigram, and the property that a given name consists of either one or two characters was used to simplify the length constraint in the modeling process. Chinese family names were simply added directly to the recognition lexicon, since their number is very limited. Chinese speech recognition experiments showed that the proposed hierarchical language model greatly improved the identification accuracy of Chinese given names compared with the conventional word-class N-gram model.
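As a rough illustration of the lower intra-word layer described above, the following Python sketch trains a bigram over toneless syllables of given names and rejects any given name that is not one or two characters long; family names are handled by a plain lexicon lookup. Everything concrete here is an assumption rather than the authors' method: the input format (numbered pinyin such as `hui4`), add-k smoothing with `k=0.1`, the hypothetical names `GivenNameBigram`, `strip_tone` and `score_full_name`, and the tiny hypothetical family-name set `FAMILY_NAMES`. The upper multi-class composite N-gram layer is not shown.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

# Tiny hypothetical family-name lexicon (toneless pinyin), for illustration only.
FAMILY_NAMES = {"wang", "li", "zhang", "liu", "chen"}


def strip_tone(syllable: str) -> str:
    """Drop trailing tone digits from numbered pinyin, e.g. 'hui4' -> 'hui'."""
    return syllable.rstrip("012345")


class GivenNameBigram:
    """Toneless-syllable bigram over given names of one or two characters.

    Add-k smoothing is an assumption of this sketch, not a detail of the paper.
    """

    def __init__(self, names, k=0.1):
        self.k = k
        self.unigrams = Counter()  # counts of each history token
        self.bigrams = Counter()   # counts of (history, next) pairs
        vocab = set()
        for name in names:
            syls = [strip_tone(s) for s in name]
            if len(syls) not in (1, 2):
                continue  # only 1- or 2-character given names are modeled
            seq = [BOS] + syls + [EOS]
            vocab.update(seq[1:])  # every token that can follow a history
            for prev, cur in zip(seq, seq[1:]):
                self.unigrams[prev] += 1
                self.bigrams[(prev, cur)] += 1
        self.vocab_size = len(vocab)

    def prob(self, prev, cur):
        """Add-k smoothed P(cur | prev)."""
        num = self.bigrams[(prev, cur)] + self.k
        den = self.unigrams[prev] + self.k * self.vocab_size
        return num / den

    def log_prob(self, name):
        """Log probability of a given name; -inf if it is not 1 or 2 syllables."""
        syls = [strip_tone(s) for s in name]
        if len(syls) not in (1, 2):
            return float("-inf")
        seq = [BOS] + syls + [EOS]
        return sum(math.log(self.prob(p, c)) for p, c in zip(seq, seq[1:]))


def score_full_name(family, given, model, family_lexicon):
    """Family name must be a lexicon entry; the given name is scored by the bigram."""
    if strip_tone(family) not in family_lexicon:
        return float("-inf")
    return model.log_prob(given)
```

For example, after `model = GivenNameBigram([["xin1", "hui4"], ["jian4", "guo2"], ["hui4", "min3"], ["wei3"], ["li4"]])`, the attested order `["xin1", "hui4"]` scores higher than `["hui4", "xin1"]`, tone digits do not change the score, and a three-syllable candidate is rejected outright. Dropping tones pools counts across tonal variants of the same syllable, which is one plausible reading of why a toneless unit helps with sparse name data.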


Similar resources

Acoustic modeling and language modeling for cantonese LVCSR

This paper describes our recent work on the development of a large-vocabulary, speaker-independent continuous speech recognition system for Cantonese (a major Chinese dialect). Both acoustic modeling and language modeling are being addressed. For acoustic modeling, we focus on right-context-dependent sub-syllable units. Tying of HMM at model as well as state level is applied based on phonetic k...

Improved Large Vocabulary Continuous Chinese Speech Recognition by Character-Based Consensus Networks

Word-based consensus networks have been verified to be very useful in minimizing word error rates (WER) for large vocabulary continuous speech recognition for western languages. By considering the special structure of Chinese language, this paper points out that character-based rather than word-based consensus networks should work better for Chinese language. This was verified by extensive exper...

Modeling context-dependent phonetic units in a continuous speech recognition system for Mandarin Chinese

We study the problem of phonetic modeling for continuous Mandarin speech recognition by providing a systematic performance comparison for systems based on following primitive speech units: syllable, demi-syllable (Initials and Finals), context-independent phones, left-or-right context-dependent phones (diphones), and left-and-right context-dependent phones (triphones). In our speaker-dependent con...

Improved context-dependent acoustic modeling for continuous Chinese speech recognition

This paper describes the new framework of context-dependent (CD) Initial/Final (IF) acoustic modeling using the decision tree based state tying for continuous Chinese speech recognition. The Extended Initial/Final (XIF) set is chosen as the basic speech recognition unit (SRU) set according to the Chinese language characteristics, which outperforms the standard IF set. An adaptive mixture increa...

Automatic speech recognition of Cantones

This paper describes our recent work on the development of a large-vocabulary, speaker-independent, continuous speech recognition system for Cantonese-English code-mixing utterances. The details of both acoustic modeling and language modeling will be discussed. For acoustic modeling, Cantonese accents in English words are handled by applying cross-lingual acoustic units, as well as modifications...

Publication date: 2006